Batteries included python package #310

jerinphilip · 2022-01-24T18:29:11Z

This PR imports python bindings from https://github.com/jerinphilip/lemonade along with a bit of python code that makes this repository a little easier to start with.

The additions bring about the following that is of value:

Ability to fetch and prepare models from model-repositories (translateLocally’s browsermt models, opus models as they share browsermt’s JSON).
Python integration boosts experimental capabilities and hopefully brings more community attention.

Summary of Changes

bindings/python folder containing python bindings and additional python code for batteries included package.
3rd_party/pybind11, new dependency.
doc/conf.py and doc/requirement.txt changes for packages (argparse docs).
doc/python.rst, wired to doc/index.rst to enable python documentation.
CMakeLists.txt changes to allow python to integrate into the existing source.
.github/workflows/python.yml imports workflows for building wheels on python.
.github/workflows/doc.yml job is moved into .github/workflows/python.yml to be ran after the wheel is built to tap into python documentation.

Python checks

All python code in this repository (including doc/conf.py) is made to adhere to the following:

black formatting.
isort with profile black.

In addition, since we lack serious tests a static type checker from google (pytype) is used to fish out stuff. Addition of a proper testing framework will only happen after this PR if at all. For now, command line is tested through GitHub CI for no error runs which should cover a decent chunk of code.

Where all does it work?

Currently supports Linux and at least one Mac (although adding more should be easier).

One build targets Colab (Could be useful for #207). MKL etc appears to be linked statically thus making it easy to distribute. The pre-built wheels have been tested to be smoothly running on:

Ubuntu 18.04 py3.6
Ubuntu 18.04 py3.7 (colab)
Ubuntu 20.04 py3.8
ArchLinux with py3.10 using the wheel built via Ubuntu.

@jerinphilip has had GLIBC version issues when trying Ubuntu 18.04 py3.8 with Ubuntu 20.04 built wheels. The general solution seems to be to go for a manylinux wheel built on a CentOS image. (Related: ugermann/ssplit-cpp#30). @jerinphilip is also squatting on pypi package name, for until the (manylinux) wheel is ready.

Fixes #234.

* Remove doc for moving into python * Include python.rst in toc * Use 18.04 cp3.7 wheel * Integrating python documentation

jerinphilip · 2022-01-24T23:20:44Z

Default is a merge by next Monday in whatever state the code is. Should anyone have the time and courage to - please do look at the 1500 line diff and leave comments. I expect to address critical stuff. Non-critical comments will still be addressed, hopefully as smaller changes behind this PR. There are known rough edges, but what is here is a minimum-viable product.

@jelmervdl finds the prospect of using python HTML experimentation and evaluations useful. Life would be easier with 2 repositories than the 3 (bergamot-translator, lemonade, tagtransfer) we're working with.

@kpu has given a no-objection for python bindings sitting somewhere not slowing down the library source and has also kindly observed that this also happens to be python bindings for "marian", which bergamot is built on top of.

macOS buildability and python target only builds are courtesy @jelmervdl. A lot of workflow code and model zoo stuff is thanks to XapaJIaMnu/translateLocally - @jelmervdl and @XapaJIaMnu.

jerinphilip

Some notes to self... and call for help.

jerinphilip · 2022-01-24T23:21:32Z

bindings/python/__init__.py

+                "browsermt", "https://translatelocally.com/models.json"
+            ),
+            TranslateLocallyLike(
+                "opus", "https://object.pouta.csc.fi/OPUS-MT-models/app/models.json"


Attribution...? Help on howto?

doc/conf.py

setup.py

3rd_party/CMakeLists.txt

XapaJIaMnu

Other than the bergamot.cpp file, I approve.

bindings/python/__init__.py

bindings/python/bergamot.cpp

XapaJIaMnu · 2022-01-25T07:43:02Z

bindings/python/repository.py

+        inventory = requests.get(self.url).text
+        with open(self.models_file_path, "w+") as models_file:
+            models_file.write(inventory)
+        self.data = json.loads(inventory)


I have updated the models.json and model_info.json to contain repository information. Eg: https://translatelocally.com/models.json
https://github.com/browsermt/students/blob/master/enfa/enfa.student.tiny11/model_info.json#L7

doc/conf.py

setup.py

MANIFEST.in

bindings/python/__init__.py

bindings/python/bergamot.cpp

bindings/python/cmds.py

jelmervdl · 2022-01-25T09:29:57Z

bindings/python/bergamot.cpp

+           py::arg("numWorkers") = 1, py::arg("cacheEnabled") = false,
+           py::arg("cacheSize") = 20000, py::arg("cacheMutexBuckets") = 1,


Does cacheSize = 20000 but cacheEnabled = false entail that it does allocate space for 20k entries, but never use it by default? Is cacheMutexBuckets=1 a good default if someone were to change the numWorkers?

So, w.r.t cacheMutexBuckets, I want to stop exposing them to the user, same as what translateLocally is doing. As for the cacheEnabled what do you think of wrapping cache in std::optional? It is also possible to set a per request enabled/disabled as well even if we've created the cache-store.

In any case, choosing to defer this for later.

So, w.r.t cacheMutexBuckets, I want to stop exposing them to the user, same as what translateLocally is doing.

I think this is a good idea. Like, there has to be a theoretical optimum value with respect to the number of threads and the cacheSize. bergamot-translator can figure out that relation. That shouldn't be up to the user.

As for the cacheEnabled what do you think of wrapping cache in std::optional? It is also possible to set a per request enabled/disabled as well even if we've created the cache-store.

My problem is that you allocate a 20.000 ptr vector by default, even when cacheEnabled is false. Easiest would be to just add a ternary operator here to set cacheSize to 0 and save a dynamic allocation for it when cacheEnabled is off.

Other options would be making cache a Ptr (so you can manage it on your own, like clearing it when switching models in translateLocally since you know that old entries wont be very relevant at that point, or keep it between AsyncService changes when changing number of workers or something) or keep it like it is now, but having cacheSize=0 shortcut all calls into it or something. Or indeed wrap it in a std::optional to save it from being initialised when cacheEnabled=0.

bindings/python/bergamot.cpp

jelmervdl · 2022-01-25T14:03:28Z

Tested it locally, and after applying the macOS patch it compiles and installs like a charm!

One thing I did notice as an annoyance while working in a jupyter notebook with it: You cannot recreate service due the logger issue (can't find the ticket for it anymore) where there can only be one instance of Logger at a time. This is annoying when re-executing a notebook cell. It works if you explicitly delete (i.e. del service) it before you initialize it.

edit: found it again! #290

…ment

Jerin Philip added 2 commits January 24, 2022 18:27

Import python binding sources

5fd468f

Add python yml file

54066c5

jerinphilip changed the title ~~Import python binding sources~~ Import python sources Jan 24, 2022

Jerin Philip added 4 commits January 24, 2022 18:54

pybind11, CmakeLists and buildability

73e1ba3

MacOS patches

ec62842

Adapting patch to skip one repository

53bf434

Skipping repository in CMakeLists.txt as well

b12561d

jerinphilip changed the title ~~Import python sources~~ Batteries included python package frontend Jan 24, 2022

jerinphilip changed the title ~~Batteries included python package frontend~~ Batteries included python package Jan 24, 2022

Connecting C++ and Python sphinx (#311)

4926734

* Remove doc for moving into python * Include python.rst in toc * Use 18.04 cp3.7 wheel * Integrating python documentation

jerinphilip marked this pull request as ready for review January 24, 2022 22:10

jerinphilip commented Jan 24, 2022

View reviewed changes

XapaJIaMnu previously approved these changes Jan 25, 2022

View reviewed changes

jelmervdl reviewed Jan 25, 2022

View reviewed changes

Jerin Philip added 4 commits January 25, 2022 14:54

Adding URL with github link to setup.py

6e6739a

setup.py LICENSE.txt to LICENSE

c153aa9

MANIFEST.in: LICENSE.txt -> LICENSE

da699f3

ServicePyAdapter::pivot - release GIL, redirect stdout/stderr

c51ae04

jerinphilip dismissed XapaJIaMnu’s stale review via c51ae04 January 25, 2022 15:01

Jerin Philip and others added 9 commits January 25, 2022 15:02

Add bindings/python to adhere to *bergamot* cpp style

8fa9894

__init__: Remove redundant try... except to bring code out

933d64a

clang-format bindings/python/bergamot.cpp

ba83d68

python files consistent styling - include doc/conf.py

8a078ff

personal github page testing artefact -> browser.mt/docs

55aca3d

option(COMPILE_PYTHON...) with some string documentation

6445a45

Prepare setup.py to pick PYTHON_LOCAL_VERSION_IDENTIFIER from environ…

0ebabf9

…ment

Set local identifier for non-tagged (semantic) versions

f8e42a8

jerinphilip merged commit c0f311a into main Jan 26, 2022

jerinphilip deleted the python branch January 26, 2022 20:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batteries included python package #310

Batteries included python package #310

jerinphilip commented Jan 24, 2022 •

edited

Loading

jerinphilip commented Jan 24, 2022 •

edited

Loading

jerinphilip left a comment

jerinphilip Jan 24, 2022

XapaJIaMnu left a comment

XapaJIaMnu Jan 25, 2022

jelmervdl Jan 25, 2022

jerinphilip Jan 26, 2022

jelmervdl Jan 26, 2022

jelmervdl commented Jan 25, 2022 •

edited

Loading

		py::arg("numWorkers") = 1, py::arg("cacheEnabled") = false,
		py::arg("cacheSize") = 20000, py::arg("cacheMutexBuckets") = 1,

Batteries included python package #310

Batteries included python package #310

Conversation

jerinphilip commented Jan 24, 2022 • edited Loading

Summary of Changes

Python checks

Where all does it work?

jerinphilip commented Jan 24, 2022 • edited Loading

jerinphilip left a comment

Choose a reason for hiding this comment

jerinphilip Jan 24, 2022

Choose a reason for hiding this comment

XapaJIaMnu left a comment

Choose a reason for hiding this comment

XapaJIaMnu Jan 25, 2022

Choose a reason for hiding this comment

jelmervdl Jan 25, 2022

Choose a reason for hiding this comment

jerinphilip Jan 26, 2022

Choose a reason for hiding this comment

jelmervdl Jan 26, 2022

Choose a reason for hiding this comment

jelmervdl commented Jan 25, 2022 • edited Loading

jerinphilip commented Jan 24, 2022 •

edited

Loading

jerinphilip commented Jan 24, 2022 •

edited

Loading

jelmervdl commented Jan 25, 2022 •

edited

Loading